An Exploration of Postings List Contiguity in Main-Memory Incremental Indexing
Abstract
For text retrieval systems, the assumption that all data structures reside in main memory is increasingly common. In this context, we present a novel incremental inverted indexing algorithm for web-scale collections that directly constructs compressed postings lists in memory. Designing efficient in-memory algorithms requires understanding modern processor architectures: in this paper, we explore the issue of postings list contiguity. Postings lists that occupy contiguous memory regions are preferred for retrieval, but maintaining contiguity is costly in terms of speed and complexity. On the other hand, allowing discontiguous index segments simplifies index construction but decreases retrieval performance. Understanding this tradeoff is our main contribution: we show that co-locating small groups of inverted list segments yields query evaluation performance that is statistically indistinguishable from that of fully contiguous postings lists. In other words, we can achieve ideal performance with a relatively small amount of effort.
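To make the notion of discontiguous index segments concrete, the following is a minimal sketch, not the algorithm from the paper. It assumes fixed-size segments allocated from one shared pool and chained by next-pointers; the class name `SegmentedPostings` and all of its methods are hypothetical, and both compression and the co-location policy mentioned above are deliberately left out.

```python
class SegmentedPostings:
    """Illustrative only: postings lists stored as chains of fixed-size
    segments carved out of a single shared pool (hypothetical design)."""

    def __init__(self, segment_size=4):
        self.segment_size = segment_size
        self.pool = []       # flat pool of document ids shared by all terms
        self.next_seg = []   # next_seg[i]: segment following segment i, or -1
        self.fill = []       # fill[i]: number of used slots in segment i
        self.head = {}       # term -> index of its first segment
        self.tail = {}       # term -> index of its last segment

    def _new_segment(self):
        # Segments are handed out in allocation order, so segments of
        # different terms end up interleaved in the pool.
        seg = len(self.next_seg)
        self.pool.extend([0] * self.segment_size)
        self.next_seg.append(-1)
        self.fill.append(0)
        return seg

    def add(self, term, doc_id):
        if term not in self.head:
            seg = self._new_segment()
            self.head[term] = self.tail[term] = seg
        seg = self.tail[term]
        if self.fill[seg] == self.segment_size:
            # Last segment is full: chain a fresh one instead of moving data.
            new_seg = self._new_segment()
            self.next_seg[seg] = new_seg
            self.tail[term] = seg = new_seg
        self.pool[seg * self.segment_size + self.fill[seg]] = doc_id
        self.fill[seg] += 1

    def segments(self, term):
        # Indices of the segments that make up this term's postings list.
        out, seg = [], self.head.get(term, -1)
        while seg != -1:
            out.append(seg)
            seg = self.next_seg[seg]
        return out

    def postings(self, term):
        # Traversal has to hop between segments scattered across the pool.
        out = []
        for seg in self.segments(term):
            start = seg * self.segment_size
            out.extend(self.pool[start:start + self.fill[seg]])
        return out


index = SegmentedPostings(segment_size=2)
for term, doc_id in [("a", 1), ("b", 1), ("a", 2), ("a", 3),
                     ("b", 4), ("a", 5), ("a", 7)]:
    index.add(term, doc_id)

print(index.postings("a"))   # [1, 2, 3, 5, 7]
print(index.segments("a"))   # [0, 2, 3]
```

In this sketch, appending never relocates existing postings, which keeps indexing simple, but reading the list for term `a` touches segments 0, 2, and 3 rather than one contiguous region. That scattered access pattern is the retrieval-side cost the abstract refers to.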
Similar resources
Fast, Incremental Inverted Indexing in Main Memory for Web-Scale Collections
For text retrieval systems, the assumption that all data structures reside in main memory is increasingly common. In this context, we present a novel incremental inverted indexing algorithm for web-scale collections that directly constructs compressed postings lists in memory. Designing efficient in-memory algorithms requires understanding modern processor architectures and memory hierarchies: ...
Memory Management Strategies for Single-Pass Index Construction in Text Retrieval Systems
Many text retrieval systems construct their index by accumulating postings in main memory until there is no more memory available and then creating an on-disk index from the in-memory data. When the entire text collection has been read, all on-disk indices are combined into one big index through a multiway merge process. This paper discusses several ways to arrange postings in memory and studie...
Improved Skips for Faster Postings List Intersection
Information retrieval can be achieved through computerized processes by generating a list of relevant responses to a query. The document processor, matching function and query analyzer are the main components of an information retrieval system. Document retrieval systems are fundamentally based on Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...
The Role of Mental Contiguity in Memory: Registration and Retrieval Effects
Implicit contiguity of related items whose list presentations are physically disparate results from the subject looking back through memory so as to bring the items together in mental experience. Effects of implicit contiguity were examined in three experiments by controlling looking-back behavior during list presentation and varying the separation of target items and related items that were la...